Wan AI Video Generator
Wan AI is an advanced visual generation model developed by Tongyi Lab. It generates videos from text, images, and other control signals. Following Wan 2.1, the Wan 2.2 series models are now fully open source.
Wan Video AI Video Generator
Wan 2.1
Open Source. Advanced open-source video generation model with exceptional quality and versatility. Perfect for professional content creation.
Text to Video Example
See how Wan 2.1 transforms text into stunning videos
A couple in formal evening attire is caught in heavy rain on their way home, sheltering under a black umbrella. In the eye-level shot, the man wears a black suit and the woman a long white dress. They walk slowly through the rain as water drips off the umbrella. The camera moves smoothly with their steps, showing their elegant posture in the rain.
Key Features
- ✓ High-quality video generation
- ✓ Text-to-video & image-to-video
- ✓ Open-source availability
Wan 2.2
Open Source. Experience the next generation of the Wan AI video generator with enhanced quality, precise control, and expanded creative possibilities.
Wan AI Video Generation
Key Features
Advanced Control
Precise control over video generation
High Performance
Optimized processing speed
Quality Output
Superior video quality
Versatile Input
Multiple input types
Wan 2.5
An AI generation model with a native multimodal architecture. Its core breakthroughs, 10-second audio-visual synchronization and 4K cinematic quality, move the series beyond the previous generation's visual-only output toward end-to-end audio-visual creation, balancing practical scenario adaptation with creative precision.
Audio-visual creation, 4K cinematic quality, 10-second clips
Key Features
Audio-Visual Sync
Native synchronization with accurate lip-sync across languages
4K Cinematic
10-second 1080P/4K HD clips at 24 fps with rich lighting
Camera Control
Advanced prompt adherence with complex camera movements
Multimodal Input
Text/image-to-video with conversational editing
Wan 2.2 Fun Control
Enhanced control and creative freedom with the latest Wan AI technology. Experience unprecedented precision in video generation.
Generation Example
Advanced motion control and style transfer
Real-time
Reference Character (input) + Reference Motion (input) → Generated Result (output)
Combining character style with reference motion to create personalized video content.
Advanced Features
- ✓ Advanced Control
- ✓ Improved Video Quality
- ✓ Enhanced Creative Options
Wan 2.2 Animate
Combine static images with reference videos to generate dynamic animated videos with advanced motion control and smooth transitions.
Animation Example
Image + Reference Video to animated video

Input Image (input) + Reference Video (input) → Generated Result (output)
Combine an image and a reference video to generate dynamic animated videos with smooth motion.
Key Features
- ✓ Image + video to video animation
- ✓ Reference video motion transfer
- ✓ Smooth motion control
Wan Video LoRA
Specialized video adaptation using Wan AI LoRA technology. Create unique and personalized video styles with minimal training (see the loading sketch after the feature list below).
Specialized Features
- ✓ Custom style adaptation
- ✓ Fast fine-tuning capabilities
- ✓ Efficient resource usage
- ✓ Advanced style transfer
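As a rough illustration of how a trained style adapter might be attached at inference time, the sketch below assumes the diffusers WanPipeline exposes the standard load_lora_weights / set_adapters interface; the checkpoint path, adapter name, and strength are placeholders rather than real releases.

```python
# Hypothetical LoRA loading sketch: attach a custom style adapter to the base
# text-to-video pipeline. Assumes the standard diffusers LoRA interface is
# available on WanPipeline; the adapter path and scale are placeholders.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

# Placeholder path to a LoRA checkpoint fine-tuned on a custom visual style.
pipe.load_lora_weights("./my_watercolor_style_lora.safetensors", adapter_name="watercolor")
pipe.set_adapters(["watercolor"], adapter_weights=[0.8])  # dial the style strength

frames = pipe(
    prompt="A lighthouse on a cliff at dawn, watercolor style",
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "watercolor_lighthouse.mp4", fps=16)
```

The same pattern would apply to any style or character adapter trained against a compatible base checkpoint; only the adapter file and trigger keywords in the prompt would change.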
Wan Image AI Image Generator
Qwen Text-to-Image
AI-Powered Image Generation
Natural Language Understanding
Generate images from natural-language descriptions in Chinese or English, supporting everything from classical poetry to modern expressions
High-Definition Output
Ultra-detailed rendering with exceptional clarity, perfect for professional content creation
Style Control
Precise style control with simple keywords, from anime to photorealistic rendering

Example Output
Generated from natural language description
Qwen Image Edit
Precise Image Editing & Enhancement
Key Features
Smart Text Editing
Intelligent font matching and style preservation for text modifications
Object Replacement
Seamless object swapping with automatic lighting and reflection adjustment
Effect Generation
Add professional visual effects with simple brush strokes
Draw to Image Workflow
Select Area
Circle or mark region
Draw Input
Sketch your changes
Describe
Add text instructions
Overview of Wan AI
SOTA Performance
Wan AI consistently outperforms leading open-source models and commercial video solutions across multiple industry benchmarks.
Consumer-GPU Optimized
The Wan AI Video T2V-1.3B model requires only 8.19 GB VRAM, enabling smooth operation on mainstream consumer GPUs. It generates 5-second 480P videos in approximately 4 minutes on an RTX 4090 (without quantization), delivering performance comparable to proprietary models.
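As a concrete illustration of that footprint, the open T2V-1.3B checkpoint can be driven from Python. The snippet below is a minimal sketch that assumes a recent Hugging Face diffusers release with Wan 2.1 support; the repository id, resolution, frame count, and guidance scale are illustrative rather than prescriptive.

```python
# Minimal text-to-video sketch for the open 1.3B checkpoint via diffusers.
# Assumes a recent diffusers release with Wan 2.1 support; adjust the model id,
# resolution, and frame count to match the checkpoint you actually download.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"  # illustrative Hugging Face repo id
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

video_frames = pipe(
    prompt="A couple under a black umbrella walking slowly through heavy rain at night",
    negative_prompt="blurry, low quality, distorted",
    height=480,
    width=832,      # 480P output
    num_frames=81,  # roughly 5 seconds at 16 fps
    guidance_scale=5.0,
).frames[0]

export_to_video(video_frames, "rainy_walk.mp4", fps=16)
```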
Multimodal Capabilities
Wan AI delivers exceptional results in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio tasks, redefining intelligent video generation.
Visual Text Rendering
Wan Video introduces the first cross-lingual text generation engine for videos, supporting both Chinese and English with production-ready typography integration.
Advanced Wan-VAE Architecture
Wan-VAE achieves breakthrough efficiency in 1080P video encoding/decoding at any duration while maintaining temporal coherence—forming the core foundation for next-generation video generation systems.
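For readers curious about the encode/decode path itself, here is a small round-trip sketch that assumes the Wan VAE is exposed through diffusers' standard autoencoder interface; the tensor shape, frame count, and repository id are assumptions for illustration only.

```python
# Conceptual round-trip through the video VAE: pixels -> latents -> pixels.
# Assumes the standard diffusers autoencoder interface (encode to a latent
# distribution, decode back to pixels); shapes and the repo id are illustrative.
import torch
from diffusers import AutoencoderKLWan

vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="vae", torch_dtype=torch.float32
)

# Dummy clip: (batch, channels, frames, height, width), values in [-1, 1].
video = torch.rand(1, 3, 17, 480, 832) * 2 - 1

with torch.no_grad():
    latents = vae.encode(video).latent_dist.sample()  # compressed spatio-temporal latents
    reconstruction = vae.decode(latents).sample       # back to pixel space

print(video.shape, latents.shape, reconstruction.shape)
```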
Text-to-Image Generation
Wan AI's native multi-modal architecture supports text-to-image generation, empowering users to directly create high-fidelity images from descriptions for diverse creative needs.
Advanced Image Editing & Composition
Wan Image excels in sophisticated editing tasks, including modifying text within images and seamlessly composing or fusing multiple pictures. It maintains high subject consistency and produces Asian portraits with enhanced realism, ensuring outputs meet commercial-grade standards.

Features of Wan AI
Wan Video Features
Complex Motion Generation
Wan Video models excel at generating realistic videos with large-scale body movements, complex rotations, dynamic scene transitions, and smooth cinematic camera motions. Advanced versions further enhance multi-character interaction and long-sequence motion consistency.
Realistic Physical Simulation
Wan AI accurately simulates real-world physics, including object collisions, gravity, fluid dynamics, and material interactions. Higher-tier models deliver more precise environmental responses and physically consistent animations.
Cinematic Visual Quality
Wan AI Video offers film-level visual quality with rich textures, natural lighting, depth-of-field effects, and multiple cinematic styles. Professional models unlock advanced visual effects, color grading, and stylized cinematic rendering.
Controllable Video Editing
Wan AI provides a universal video editing framework with precise controllability using image or video references. Different model versions support object replacement, motion transfer, scene restructuring, and temporal consistency editing.
Visual Text & Dynamic Typography
Wan Video can generate static and dynamic text effects directly inside videos from text prompts. Advanced models support bilingual (Chinese & English) typography, animated captions, and creative text motion effects for advertising and media production.
Wan Image Features
High-Precision Image Generation
Wan Image generates high-resolution images with accurate structure, detailed textures, and realistic lighting. Different versions support 2K–4K output, ultra-detailed realism, and artistic illustration styles.
Advanced Image Editing & Inpainting
Wan Image supports precise inpainting, object removal, detail enhancement, and content replacement. Professional versions enable pixel-level refinement and complex region-aware editing.
Style Transfer & Visual Control
Wan Image enables multi-style rendering, including realism, anime, 3D, watercolor, oil painting, and cyberpunk. Advanced models support fine-grained style strength control and cross-style fusion.
Outpainting & Image Expansion
Wan Image allows seamless image expansion beyond original boundaries while maintaining visual consistency. Higher-end models support wide-format expansion for banners, posters, and commercial layouts.
ArtAny AI & Wan AI Product Features
ArtAny AI seamlessly integrates Wan AI's powerful video and image models into a unified, user-friendly creative platform. With just a few clicks, users can generate, edit, and enhance videos, images, and audio content for marketing, social media, advertising, and professional production.
Wan AI Text to Video
Transform simple text prompts into high-quality cinematic videos with dynamic motion, realistic physics, and multiple visual styles powered by Wan Video.
Wan Image to Video
Animate static images into vivid motion videos with smooth transitions, camera movement, and character animation, powered by Wan Video technology.
Start & End Frame Control
Precisely control the opening and closing frames of your video to ensure visual consistency, smooth transitions, and stronger storytelling.
Wan AI Text to Image
Generate high-resolution images from text prompts with ultra-detailed realism, artistic illustration styles, and full creative control powered by Wan Image.
Image Editing & Enhancement
Edit images with powerful Wan AI tools, including inpainting, object removal, background replacement, style transfer, and outpainting, for professional-grade visual design.
Video-to-Audio & AI Voice
Generate background music, sound effects, and AI voiceovers directly from videos or scripts, enabling synchronized audio-visual production in one workflow.
Wan AI Video Editing & Visual Effects
Enhance videos with intelligent editing features such as object replacement, motion transfer, cinematic color grading, and stylized visual effects.
Wan AI Open Source Release
Alibaba has officially open-sourced the code and weights for both Wan 2.1 and Wan 2.2 through its community repository. Wan AI is a comprehensive and open suite of video foundation models, specifically designed to push the boundaries of video generation and empower the developer and research communities.
Wan 2.2 Open-Source Models
Wan2.2 represents a major upgrade to the Wan video foundation models, delivering significant improvements in architecture, visual quality, motion realism, and high-definition generation efficiency.
Key highlights include:
MoE Architecture for Higher Model Capacity
Wan2.2 introduces a Mixture-of-Experts (MoE) structure into video diffusion, enabling larger effective model capacity without increasing computational cost.
Cinematic-Level Aesthetic Control
With carefully curated aesthetic datasets labeled by lighting, composition, contrast, and color tone, Wan2.2 enables highly controllable cinematic-style video generation.
Stronger Complex Motion Generation
Trained on substantially larger datasets (+65.6% images, +83.2% videos vs. Wan2.1), Wan2.2 achieves top-tier performance in motion realism, semantic accuracy, and aesthetic quality.
Efficient 720P Hybrid Text & Image to Video (TI2V)
The open-sourced 5B model with Wan2.2-VAE supports both Text-to-Video and Image-to-Video at 720P, 24fps, runs on consumer GPUs like RTX 4090, and ranks among the fastest HD video models available.
Advanced I2V-A14B Image-to-Video Model
Built with MoE architecture, the I2V-A14B model supports 480P and 720P I2V generation with more stable motion, fewer unrealistic camera movements, and stronger performance for stylized scenes.
Wan2.2 S2V-14B
An audio-driven model that generates speech-synchronized video from a reference image and an audio track.
Wan2.2 Animate-14B
A character animation model that combines a static image with a reference video, powering the Wan 2.2 Animate workflow described above.
Wan 2.1 Open Source Models
Wan2.1 is a comprehensive and open suite of video foundation models that significantly advances the capabilities of Wan AI Video Generator.
Key highlights include:
State-of-the-Art Performance
Wan2.1 achieves top-tier performance across multiple benchmarks, outperforming most open-source video models and rivaling leading commercial solutions.
Consumer GPU Compatibility
The T2V-1.3B model runs on as little as 8.19 GB VRAM, enabling high-quality video generation on mainstream consumer GPUs such as the RTX 4090.
Full-Stack Multi-Task Support
Wan2.1 supports Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, delivering a complete multimodal video generation pipeline.
Bilingual Visual Text Generation
As the first video model capable of generating both Chinese and English on-screen text, Wan AI 2.1 expands real-world creative and commercial use cases.
High-Performance Wan-VAE
Wan-VAE enables efficient encoding and decoding of 1080P videos of any length while preserving temporal consistency, serving as a robust foundation for video and image generation.
T2V-14B Flagship Model
The T2V-14B model sets a new SOTA benchmark across open and closed models, excelling in dynamic motion generation and supporting 480P and 720P bilingual video output.
Wan 2.6 has been officially released
Bringing a major leap forward in AI video generation
15-Second Long-Form Video Generation
Unlock extended creative storytelling possibilities for creators, filmmakers, and marketers with 15-second long-form video generation.
LoRA Fine-Tuning Support
Customize characters, styles, and motion behaviors with lightweight training—making personalized AI video creation faster and more accessible than ever.
Enhanced Character Consistency
Greatly strengthened character consistency, ensuring stable identities, facial features, and motion continuity across longer video sequences.
Native AI Music Generation
Wan AI music generation will be natively integrated, allowing seamless synchronization of visuals and sound within a single creative workflow.
Wan AI Frequently Asked Questions
What is Wan Video by Wan AI and how does it work?
Wan Video is a state-of-the-art video generation system developed under the Wan AI framework. It transforms text or image inputs into high-quality videos using advanced technologies such as Variational Autoencoders (VAE) and Diffusion Transformers (DiT), delivering realistic motion, cinematic visuals, and accurate physical behavior.
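To make the division of labor between the VAE and the DiT concrete, the following is a purely conceptual sketch using dummy tensors and stub functions rather than the real Wan API: a text embedding conditions a transformer that iteratively denoises a compact video latent, and a VAE-style decoder then expands that latent back into frames.

```python
# Purely conceptual latent-diffusion loop with dummy tensors -- not the real
# Wan API. It only illustrates how a DiT denoises VAE latents under text
# conditioning before a VAE decoder turns latents back into frames.
import torch

def fake_text_encoder(prompt: str) -> torch.Tensor:
    # Stand-in for a real text encoder: prompt -> conditioning embedding.
    torch.manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(1, 77, 512)

def fake_dit(latents: torch.Tensor, t: int, cond: torch.Tensor) -> torch.Tensor:
    # Stand-in for the Diffusion Transformer: predicts the noise to remove.
    return 0.1 * torch.randn_like(latents)

def fake_vae_decode(latents: torch.Tensor) -> torch.Tensor:
    # Stand-in for the VAE decoder: upsample latents back toward pixel resolution.
    rgb = latents[:, :3]  # pretend the first 3 latent channels are RGB
    return rgb.repeat_interleave(8, dim=-1).repeat_interleave(8, dim=-2)

cond = fake_text_encoder("a couple walking in the rain under a black umbrella")
latents = torch.randn(1, 16, 13, 60, 104)  # (batch, channels, frames, h, w) in latent space

for t in range(50, 0, -1):                 # iterative denoising steps
    latents = latents - fake_dit(latents, t, cond)

frames = fake_vae_decode(latents)          # decode to pixel-space frames
print(frames.shape)
```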
Do I need technical expertise to use Wan AI?
No technical background is required. Wan AI is designed with a user-friendly interface that allows beginners and professionals alike to generate high-quality videos easily without coding or complex configuration.
What types of videos can I create with Wan Video?
Wan Video supports a wide range of video content, including character animation, dancing, sports, cinematic storytelling, educational content, marketing videos, historical restoration, and stylized creative scenes.
How long does it take to generate a video by Wan AI?
Video generation time depends on resolution, duration, and motion complexity. Higher-performance versions of Wan AI offer faster processing speeds for time-sensitive production needs.
Can I customize the video output with Wan AI?
Yes. Wan Video allows flexible control over resolution, frame rate, motion intensity, camera movement, visual style, and more—giving you full creative control over the final result.
What input formats does Wan Video support?
Wan Video currently supports text-to-video and image-to-video generation. Users can provide detailed text prompts or reference images to guide scene composition, motion, and visual style.
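As a companion to the text-to-video sketch shown earlier on this page, the snippet below illustrates the image-to-video path through diffusers; it assumes a recent release that ships a Wan image-to-video pipeline, and the repository id, image URL, and sampling parameters are placeholders.

```python
# Image-to-video sketch via diffusers. Assumes a recent diffusers release with
# a Wan image-to-video pipeline; model id, URL, and parameters are illustrative.
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.1-I2V-14B-480P-Diffusers",  # illustrative repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

first_frame = load_image("https://example.com/first_frame.png")  # placeholder URL

frames = pipe(
    image=first_frame,
    prompt="The scene comes alive: gentle camera push-in, leaves drifting in the wind",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "animated_scene.mp4", fps=16)
```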
Does Wan AI support multilingual video generation?
Yes. Wan AI supports multilingual text prompts, including English and Chinese. Video content and on-screen visual text can be generated in different languages depending on the selected model.
Is there a limit to the length of videos generated by Wan AI?
Video length limits depend on the platform plan and model version. Entry-level access may have shorter duration limits, while advanced plans support longer, more complex video generation.
How does Wan Video ensure high-quality output?
Wan Video leverages advanced VAE and DiT architectures, large-scale training datasets, and optimized motion modeling to ensure cinematic visuals, smooth transitions, realistic physics, and stable temporal consistency.
How does Wan Video handle complex scenes with multiple characters?
Wan Video analyzes character relationships, spatial positioning, and motion interactions from the input prompt, ensuring natural movement, realistic interactions, and consistent multi-character behavior.
What open-source models are currently available from Wan AI?
Wan AI has open-sourced multiple models, including high-definition Text-to-Video and Image-to-Video models, as well as specialized MoE-based architectures for stable motion generation and stylized video synthesis.
What other open-source AI models has Alibaba Cloud released related to Wan AI?
Alibaba Cloud has released a broad ecosystem of open-source AI models, including Qwen large language models, multimodal vision-language models, image generation models, and audio generation systems—forming a complete multimodal AI infrastructure alongside Wan AI.
What is Wan Image by Wan AI and what can it be used for?
Wan Image is the image generation and editing system under the Wan AI framework. It supports text-to-image creation, high-resolution visual rendering, commercial-grade design output, and creative illustration across advertising, e-commerce, branding, gaming, and digital art production.
Does Wan Image support professional image editing and style control?
Yes. Wan Image supports advanced image editing features such as inpainting, outpainting, object removal, background replacement, super-resolution enhancement, and multi-style transfer. Users can precisely control realism, artistic styles, lighting, and composition for professional creative workflows.
